GENERIC HANDLING BY A CLIENT OF DIFFERENT MARK UP
LANGUAGE PARSERS AND GENERATORS 



DESCRIPTION OF THE PRIOR ART 

Mark-up language is a set of codes in a text file that enables a computing device to format
the text for correct display or printing. A client (i.e. any process that requests a service
from another process) in a software system creates mark-up language using a 'generator'.
It reads and interprets mark-up language using a 'parser'.

In the prior art, parsers and generators have been specific to certain kinds of mark-up
languages. For example, a client could use an XML (extensible mark-up language) parser
to interpret and handle XML files; it could use a separate WBXML (WAP binary XML)
parser to interpret and handle WBXML files. In each case, the client would talk directly to
each parser.

When the client needs to generate mark-up language format files, there could be an XML
generator and a separate WBXML generator. Again, the client would talk directly to each
generator.

In prior art systems, clients have had to be hard-coded to handle and talk directly with
these specific kinds of parsers and generators; in practice, this has meant that clients are
either extremely complex (if they need to handle several different mark-up language
formats) or else they are restricted to a single mark-up language format.



SUMMARY OF THE PRESENT INVENTION 

The present invention is a portable computing device programmed with a client that
interfaces with a mark-up language parser or a generator via an intermediary layer that (a)
insulates the client from having to communicate directly with the parser or generator and
is (b) generic in that it presents a common API to the client irrespective of the specific
kind of parser or generator the intermediary layer interfaces with.
In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any different kind of parser compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.

The API is typically implemented as a header file. In an implementation, the
intermediary layer acts as an extensible framework and the parsers and generators are
themselves plug-ins to that framework. The present invention may hence readily allow
the device to operate with different kinds of parsers and generators: this extensibility is
impossible to achieve with prior art hard-coded systems.
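To make the plug-in arrangement concrete, here is a minimal sketch in standard C++ rather than SymbianOS ECOM. It assumes the intermediary layer can be modelled as a registry of parser factories keyed by MIME type. `GenericParser`, `ParserFramework` and `ToyXmlParser` are hypothetical names, not part of the framework described here, and the toy parser only recognises start tags.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical generic parser API presented by the intermediary layer.
// The client only ever sees this interface, never a concrete parser type.
class GenericParser {
public:
    virtual ~GenericParser() = default;
    // Parses the input and returns the start-element names found, in order.
    virtual std::vector<std::string> Parse(const std::string& input) = 0;
};

// Hypothetical registry standing in for the framework's plug-in discovery.
// Parser plug-ins are registered under the MIME type they handle.
class ParserFramework {
public:
    using Factory = std::function<std::unique_ptr<GenericParser>()>;

    void RegisterPlugin(const std::string& mimeType, Factory factory) {
        factories_[mimeType] = std::move(factory);
    }

    // Returns a parser for the MIME type, or throws if no plug-in is installed.
    std::unique_ptr<GenericParser> Open(const std::string& mimeType) const {
        auto it = factories_.find(mimeType);
        if (it == factories_.end())
            throw std::runtime_error("no parser plug-in for " + mimeType);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// A toy "XML" plug-in: reports the name of every start tag such as <a>.
class ToyXmlParser : public GenericParser {
public:
    std::vector<std::string> Parse(const std::string& input) override {
        std::vector<std::string> names;
        for (std::size_t pos = input.find('<'); pos != std::string::npos;
             pos = input.find('<', pos + 1)) {
            std::size_t end = input.find('>', pos);
            if (end == std::string::npos) break;  // unterminated tag: stop
            std::string tag = input.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] != '/') names.push_back(tag);
        }
        return names;
    }
};
```

Installing a new parser kind is then a single `RegisterPlugin` call; the client code that calls `Open` and `Parse` does not change.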

The specific kind of parser or generator being used is not known to the client: the
intermediary layer fully insulates the client from needing to be aware of these specifics.
Instead, the client deals only with the intermediary layer, which presents to the client as a
generic parser or a generic generator, i.e. a parser or generator which behaves in a way
that is common to all parsers or generators.

For example, the SyncML protocol supports both XML and WBXML. By plugging both
XML and WBXML parser and generator plug-ins into the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix 2 expands on this idea.

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark-up language format and
then use the parsed document data to generate an equivalent document in a different file
format. Because of the extensible plug-in design of an implementation of the system, it
is possible to provide far more kinds of file conversion capability than was previously
the case. New kinds of parsers and generators can be provided for loading onto a device
after that device has been shipped to an end-user. The only requirement is that they are
compatible with the intermediary layer.
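The conversion idea can be sketched by letting one parser's events drive a generator for a different format. In this hypothetical C++ example, `MarkupCallback`, `ParseToyXml` and `SExprGenerator` are illustrative names. The toy parser handles only bare start tags, end tags and text, and the S-expression output merely stands in for "a different file format".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Common event interface: any generator that implements it can consume
// the output of any parser that emits it.
class MarkupCallback {
public:
    virtual ~MarkupCallback() = default;
    virtual void OnStartElement(const std::string& name) = 0;
    virtual void OnEndElement(const std::string& name) = 0;
    virtual void OnContent(const std::string& text) = 0;
};

// Toy parser: understands only <name>, </name> and text (no attributes).
void ParseToyXml(const std::string& doc, MarkupCallback& cb) {
    std::size_t i = 0;
    while (i < doc.size()) {
        if (doc[i] == '<') {
            std::size_t end = doc.find('>', i);
            if (end == std::string::npos) break;  // truncated tag: stop
            std::string tag = doc.substr(i + 1, end - i - 1);
            if (!tag.empty() && tag[0] == '/') cb.OnEndElement(tag.substr(1));
            else cb.OnStartElement(tag);
            i = end + 1;
        } else {
            std::size_t next = doc.find('<', i);
            if (next == std::string::npos) next = doc.size();
            cb.OnContent(doc.substr(i, next - i));
            i = next;
        }
    }
}

// Toy generator for a different output format (S-expressions).
class SExprGenerator : public MarkupCallback {
public:
    void OnStartElement(const std::string& name) override {
        if (needSpace_) out_ += ' ';
        out_ += '(' + name;
        needSpace_ = true;
    }
    void OnEndElement(const std::string&) override {
        out_ += ')';
        needSpace_ = true;
    }
    void OnContent(const std::string& text) override {
        if (needSpace_) out_ += ' ';
        out_ += '"' + text + '"';
        needSpace_ = true;
    }
    const std::string& Output() const { return out_; }

private:
    std::string out_;
    bool needSpace_ = false;  // separate items after the opening "(name"
};
```

For example, feeding `<a><b>hi</b></a>` through `ParseToyXml` into an `SExprGenerator` yields `(a (b "hi"))`. Neither function knows about the other's format, only the shared callback.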






Another advantage of the present invention is that it allows not only different parsers
and generators to be readily used by the same client, but also several different
clients to share the same parsers and generators.

The API may itself be extensible, so that extensions to its capabilities (e.g. to enable a
new/extended mark-up language of a document to be handled) can be made without
affecting compatibility with existing clients or existing parsers and generators. This may
be achieved through an updated/extended namespace plug-in; this plug-in sets up all the
elements, attributes and attribute values for a namespace. Similarly, new kinds of clients
can be provided for loading onto a device after that device has been shipped to an
end-user. The only requirement is that they are compatible with the intermediary layer.
DETAILED DESCRIPTION
Overview of Key Features 

The present invention is implemented in a system called the Mark-Up Language
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom.
SymbianOS is an operating system for smart phones and advanced mobile telephones,
and other kinds of portable computing devices.

The Mark-Up Language framework implements three key features. 
1. Generic Parser API

Clients are separated from mark-up language parsers/generators by an intermediary layer
that (a) insulates the client from having to communicate directly with the parser or
generator and is (b) generic in that it presents a common API to the client irrespective of
the specific kind of parser or generator the intermediary layer interfaces with.

2. Data validation/pre-filtering and altering components in a chain of 
responsibility 

Mark-up language parsers or generators can access components to validate, pre-filter or
alter data; the components are plug-in components that operate using a 'chain of
responsibility' design pattern.

3. Generic Data Supplier API

The mark-up language parsers or generators can access data from a source using a
generic data supplier API, insulating the parser or generator from having to
communicate directly with the data source.

Each of these features will now be discussed in more detail.

1. Generic Parser Intermediary Layer

The essence of this approach is that the client interfaces with a mark-up language
parser or a generator via an intermediary layer that (a) insulates the client from having to
communicate directly with the parser or generator and is (b) generic in that it presents a
common API to the client irrespective of the specific kind of parser or generator the
intermediary layer interfaces with.

In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any different kind of parser compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.

The API is typically implemented as a header file. In an implementation, the
intermediary layer acts as an extensible framework and the parsers and generators are
themselves plug-ins to that framework. The present invention may hence readily allow
the device to operate with different kinds of parsers and generators: this extensibility is
impossible to achieve with prior art hard-coded systems.

The specific kind of parser or generator being used is not known to the client: the
intermediary layer fully insulates the client from needing to be aware of these specifics.
Instead, the client deals only with the intermediary layer, which presents to the client as a
generic parser or a generic generator, i.e. a parser or generator which behaves in a way
that is common to all parsers or generators.

For example, the SyncML protocol supports both XML and WBXML. By plugging both
XML and WBXML parser and generator plug-ins into the framework, a SyncML client
can use either or both types of parser and generator without knowing about the type of
mark-up language; as a result, the design of the SyncML client is greatly simplified. Since
WBXML and XML are quite different in the way they represent their data, one very
useful feature of the invention is the mapping of WBXML tokens to a string in a static
string pool table. Appendix 2 expands on this idea.

The present invention may provide a flexible and extensible file conversion system: for
example, the device could parse a document written in one mark-up language format and
then use the parsed document data to generate an equivalent document in a different file
format. Because of the extensible plug-in design of an implementation of the system, it
is possible to provide far more kinds of file conversion capability than was previously
the case. New kinds of parsers and generators can be provided for loading onto a device
after that device has been shipped to an end-user. The only requirement is that they are
compatible with the intermediary layer.
Another advantage of the present invention is that it allows not only different parsers
and generators to be readily used by the same client, but also several different
clients to share the same parsers and generators. The API may itself be
extensible, so that extensions to its capabilities (e.g. to enable a new/extended mark-up
language of a document to be handled) can be made without affecting compatibility with
existing clients or existing parsers and generators. Similarly, new kinds of clients can be
provided for loading onto a device after that device has been shipped to an end-user.
The only requirement is that they are compatible with the intermediary layer.


2. Data validation/pre-filtering and altering components in a chain of
responsibility

The essence of this approach is that the mark-up language parser or generator can access
components to validate, pre-filter or alter data, in which the components are plug-in
components that operate using a chain of responsibility.

Because of the plug-in design of the components, the system is inherently flexible and
extensible compared with prior art systems in which a component (for validating,
pre-filtering or altering data from a parser or generator) would be tied exclusively to a given
parser. Hence, if a mark-up language of a document is extended, or a new one created, it
is possible to write any new validation/pre-filter/altering plug-in that is needed to work
with the extended or new language. These new kinds of validation/pre-filter/altering
plug-ins can be provided for loading onto a device even after that device has been
shipped to an end-user. The 'chain of responsibility' design pattern, whilst known in
object oriented programming, has not previously been used in the present context.

The plug-in components may all present a common, generic API to the parser and
generator. Hence, the same plug-in can be used with different types of parsers and
generators (e.g. an XML parser, a WBXML parser, an RTF parser etc.). The plug-ins also
present a common, generic API to a client component using the parser or generator.
Hence, the same plug-ins can be used by different clients.

For example, a DTD validator plug-in could be written that validates the mark-up of a
document and can report errors to the client. Or, for a web browser, an auto correction
plug-in filter could be written that tries to correct errors found in the mark-up language,
such as a missing end element tag or an incorrectly placed element tag. The auto
correction plug-in will, if it can, fix the error transparently to the client. This enables a
web browser to still display a document rather than just displaying an error reporting that
there was an error in the document.

Because the plug-ins can be chained together, complex and different types of filtering and
validation can take place. In the example above, the parser could notify the validator
plug-in of elements it is parsing; these in turn would go to the auto correction plug-in to
be fixed if required, and finally the client would receive these events.

The mark-up framework allows parser plug-ins to expose the parsed element stack to all
validation/pre-filter/altering plug-ins. (The parsed element stack is a stack populated
with elements from a document, extracted as that document is parsed; this stack is made
available to all validation/pre-filter/altering plug-ins to avoid the need to duplicate the
stack for each of these plug-ins.) This also enables the plug-ins to use the stack
information to aid in validation and filtering. For example, an auto corrector plug-in may
need to know the entire element list that is on the stack in order to figure out how to fix
a problem.
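A minimal sketch of an auto corrector that reads the shared element stack, assuming the stack is a simple vector of element names that the parser pushes onto and the plug-in may pop from. `ElementStack` and `AutoCorrector` are hypothetical names, and the repair policy (synthesise missing end tags, drop stray ones) is one possible choice, not the framework's.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Shared element stack: one instance, visible to every plug-in in the chain.
class ElementStack {
public:
    void Push(const std::string& name) { elements_.push_back(name); }
    void Pop() { elements_.pop_back(); }
    const std::vector<std::string>& Elements() const { return elements_; }

private:
    std::vector<std::string> elements_;
};

// Hypothetical auto corrector: when an end tag arrives, it inspects the
// whole stack to decide which end tags the client should actually receive.
class AutoCorrector {
public:
    explicit AutoCorrector(ElementStack& stack) : stack_(stack) {}

    std::vector<std::string> OnEndElement(const std::string& name) {
        std::vector<std::string> out;
        const auto& els = stack_.Elements();
        // Stray end tag for an element that is not open: drop it.
        if (std::find(els.begin(), els.end(), name) == els.end())
            return out;
        // Close any elements left open above the matching one.
        while (stack_.Elements().back() != name) {
            out.push_back(stack_.Elements().back());  // synthesised end tag
            stack_.Pop();
        }
        out.push_back(name);
        stack_.Pop();
        return out;
    }

private:
    ElementStack& stack_;
};
```

Because the corrector reads the same stack the parser maintains, it needs no private copy of the open-element list, which is the duplication the shared stack is meant to avoid.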

The use of filter/validator plug-ins in mark-up language generators is especially useful for
developers writing a client to the framework and generating mark-up documents, as the
same validator plug-in used by the parser can be used in the generator. Errors are
reported to the client when the mark-up does not conform to the validator, which
enables developers to make sure they are writing well-formed mark-up that conforms
to the DTD and to catch errors early on during development.

The mark-up framework incorporates a character conversion module that enables
documents written in different character sets (e.g. ASCII, various Kanji character sets
etc.) to be parsed and converted to UTF-8. This means a client obtains the results from
the parser in a generic way (UTF-8) without having to know the original character set that
was used in the document. Clients hence no longer need to be able to differentiate
between different character sets and handle the different character sets appropriately.
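As an illustration of the conversion step, the sketch below converts ISO-8859-1 (Latin-1) bytes to UTF-8. Latin-1 is chosen only because its mapping to Unicode is trivial. A real module, such as one built on the CCnvCharacterSetConverter class mentioned in Appendix 1, would handle many more character sets, including multi-byte Kanji encodings, which this sketch does not attempt. `Latin1ToUtf8` is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8. Every Latin-1 byte is the
// Unicode code point of the same value, so at most two UTF-8 bytes are needed.
std::string Latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char b : latin1) {
        if (b < 0x80) {
            utf8 += static_cast<char>(b);                  // ASCII: unchanged
        } else {
            utf8 += static_cast<char>(0xC0 | (b >> 6));    // lead byte
            utf8 += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 ('é') becomes the two UTF-8 bytes 0xC3 0xA9, while ASCII text passes through unchanged.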
3. Generic Data Supplier API

The mark-up language parser or generator accesses data from a source using a generic
data supplier API. Hence, the parser or generator is insulated from having to talk directly
to a data source; instead, it does so via a generic data supplier API acting as an
intermediary layer. This de-couples the parser or generator from the data source and
hence means that the parser or generator no longer has to be hard coded for a specific
data supplier. This in turn leads to a simplification of the parser and generator design.

The present invention allows parsing and generation to be carried out with any data
source. For example, a buffer in memory could be used, as could a file, as could
streaming data (e.g. data streamed over the internet). There is no requirement to define,
at parser/generator build time, what particular data source will be used. Instead, the
system allows any source that can use the generic data supplier API to be adopted. New
types of data sources can be utilised by computing devices, even after those devices have
been shipped to end-users.
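A minimal C++ sketch of the generic data supplier idea, assuming a pull-style interface in which the parser repeatedly asks for the next chunk of bytes. `DataSupplierReader`, `BufferSupplier`, `StreamSupplier` and `CountTags` are hypothetical names loosely modelled on the MDataSupplierReader role described in Appendix 1. The "parser" merely counts `<` characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical generic data supplier interface: the parser pulls bytes
// without knowing where they come from.
class DataSupplierReader {
public:
    virtual ~DataSupplierReader() = default;
    // Fills out with up to maxBytes of data; returns false at end of data.
    virtual bool Read(std::string& out, std::size_t maxBytes) = 0;
};

// Supplies data from a buffer in memory.
class BufferSupplier : public DataSupplierReader {
public:
    explicit BufferSupplier(std::string data) : data_(std::move(data)) {}
    bool Read(std::string& out, std::size_t maxBytes) override {
        if (pos_ >= data_.size()) return false;
        std::size_t n = std::min(maxBytes, data_.size() - pos_);
        out = data_.substr(pos_, n);
        pos_ += n;
        return true;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Supplies data from any std::istream (a file stream, a network wrapper, ...).
class StreamSupplier : public DataSupplierReader {
public:
    explicit StreamSupplier(std::istream& in) : in_(in) {}
    bool Read(std::string& out, std::size_t maxBytes) override {
        out.assign(maxBytes, '\0');
        in_.read(&out[0], static_cast<std::streamsize>(maxBytes));
        out.resize(static_cast<std::size_t>(in_.gcount()));  // keep bytes read
        return !out.empty();
    }

private:
    std::istream& in_;
};

// A stand-in "parser" that reads in small chunks and counts '<' characters.
int CountTags(DataSupplierReader& reader) {
    int count = 0;
    std::string chunk;
    while (reader.Read(chunk, 4))
        count += static_cast<int>(std::count(chunk.begin(), chunk.end(), '<'));
    return count;
}
```

`CountTags` is written once against the interface and works unchanged with either source; adding a new source means adding one more class, not modifying the parser.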


The present invention is implemented in a system called the Mark-Up Language 
Framework, used in SymbianOS from Symbian Limited, London, United Kingdom. 
SymbianOS is an operating system for smart phones and advanced mobile telephones, 
and other kinds of portable computing devices.


Appendix 1 describes the Mark-Up Language Framework in more detail. Appendix 2
describes a particular technique, referred to as 'String Pool', which is used in the
Mark-Up Language Framework. The appendices refer to various SymbianOS-specific
programming techniques and structures. There is an extensive published literature
describing these techniques; reference may for example be made to "Professional
Symbian Programming", Wrox Press Inc., ISBN 186100303X.



Appendix 1 

Mark-up Framework Design Document v0.4
Introduction
Purpose and Scope

This document describes the architecture for a generic mark-up framework. The
framework is extendable using plug-ins so that mark-up parsers and generators (e.g.
XML[1], WBXML[2]) can be used.

Design Overview 
Block Diagrams 

The mark-up framework block diagram is shown in an accompanying figure. The Client is
the application using the mark-up framework for parsing or generating a document. The
Parser and Generator components are specific to a mark-up language (e.g. XML or
WBXML). These components use the Namespace collection to retrieve information about
a specific namespace during the parsing or generating phase.

The Namespace Plug-in component is an ECOM plug-in that sets up all the elements,
attributes and attribute values for a namespace. For each namespace used there must be a
plug-in that describes the namespace. The namespace information is stored in a string
pool. The string pool is a way of storing strings that makes comparison almost
instantaneous at the expense of string creation. It is particularly efficient at handling
string constants that are known at compile time, which makes it very suitable for
processing documents. The Namespace owns the string pool that the Parser,
Generator and Client can gain access to.
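The string pool trade-off (slower creation, near-instant comparison) can be sketched with string interning: each distinct string is stored once and referred to by an integer handle. This is a simplified stand-in written in standard C++, not the SymbianOS RStringPool API. `StringPool` and `LoadSyncMLNamespace` are hypothetical, and the three SyncML element names are used only as sample content.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal interning pool: equal strings always receive the same handle, so
// comparing two pooled strings is one integer comparison.
class StringPool {
public:
    using Handle = std::size_t;

    // Creation pays for a hash lookup and, for a new string, an insertion.
    Handle Intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        Handle h = strings_.size();
        strings_.push_back(s);
        index_.emplace(s, h);
        return h;
    }

    const std::string& Get(Handle h) const { return strings_.at(h); }

private:
    std::vector<std::string> strings_;               // handle -> text
    std::unordered_map<std::string, Handle> index_;  // text -> handle
};

// Hypothetical namespace plug-in: pre-loads the element names it knows about,
// playing the role of the compile-time string constants described above.
void LoadSyncMLNamespace(StringPool& pool) {
    for (const char* name : {"SyncML", "SyncHdr", "SyncBody"})
        pool.Intern(name);
}
```

Once the namespace is loaded, a parser that interns a tag name read from a document can test it against a known element with a single handle comparison.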

The Namespace Plug-in simply sets up the string pool with the required strings for the
namespace the plug-in represents. The Client may get access to the Namespace
Collection via the Parser or Generator to pre-load namespaces prior to parsing or
generating documents, which may speed up the parsing or generating session.

The Plug-in components (1 - 4) are optional and allow further processing of the data
before the client receives it, such as DTD validators or document auto correctors.
Validators check that the elements and attributes conform to the DTD. Document auto
correction plug-ins are used to try to correct errors reported from DTD validators.
The parser is event driven and sends events to the various plug-ins and UI during
parsing. An accompanying figure shows a client parsing with a DTD validator and auto
corrector. The client talks to the parser directly to start the parse. The parser sends
events to the chain of plug-ins. The first plug-in that receives events is the DTD validator
plug-in. This plug-in validates that the data in the event it received is correct. If it is not
correct, it sends the same event that the parser sent to the validator on to the auto
corrector, but with an added error code that describes the problem the validator
encountered. If the event data is valid, the same event is sent to the auto corrector
unchanged. The auto corrector then receives the event and can check for any errors. If
there is an error, it can attempt to correct it. If it can correct the error, it modifies the data
in the event and removes the error code before sending the event to the client. The client
finally receives the event and can now handle it.
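The parsing flow above is an instance of the chain of responsibility pattern. The C++ sketch below assumes a single start-element event type carrying an integer error code. `StartElementEvent`, `MarkupCallback`, `Validator`, `AutoCorrector`, `Client` and the error value are hypothetical, and the corrector's repair rule (lower-casing a flagged tag name) is a toy stand-in for real auto correction.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

constexpr int kErrUnknownElement = -1;  // hypothetical error code

// Event passed down the chain; error is 0 when the data is valid.
struct StartElementEvent {
    std::string name;
    int error = 0;
};

// Every link in the chain (plug-in or client) implements the same callback.
class MarkupCallback {
public:
    virtual ~MarkupCallback() = default;
    virtual void OnStartElement(StartElementEvent event) = 0;
};

// Validator: always forwards the event, attaching an error code if invalid.
class Validator : public MarkupCallback {
public:
    Validator(std::set<std::string> allowed, MarkupCallback& next)
        : allowed_(std::move(allowed)), next_(next) {}
    void OnStartElement(StartElementEvent event) override {
        if (allowed_.count(event.name) == 0) event.error = kErrUnknownElement;
        next_.OnStartElement(event);
    }

private:
    std::set<std::string> allowed_;
    MarkupCallback& next_;
};

// Auto corrector: tries to repair flagged events and clears the error code
// only when it actually changed something.
class AutoCorrector : public MarkupCallback {
public:
    explicit AutoCorrector(MarkupCallback& next) : next_(next) {}
    void OnStartElement(StartElementEvent event) override {
        if (event.error == kErrUnknownElement) {
            std::string lower = event.name;
            for (char& c : lower)
                c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            if (lower != event.name) {  // toy rule: assume a case error
                event.name = lower;
                event.error = 0;
            }
        }
        next_.OnStartElement(event);
    }

private:
    MarkupCallback& next_;
};

// The client at the end of the chain simply records what it receives.
class Client : public MarkupCallback {
public:
    void OnStartElement(StartElementEvent event) override {
        received.push_back(event);
    }
    std::vector<StartElementEvent> received;
};
```

The chain is wired from the client backwards: `Client client; AutoCorrector fixer(client); Validator validator({"html", "b"}, fixer);`, after which the parser would only ever call `validator`. Reordering or inserting plug-ins changes only this wiring, not the parser or the client.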
An accompanying figure illustrates a client generating using DTD validator and auto
corrector plug-ins. A real client would probably never use a generator and auto corrector
together, since the data the client generates should always be valid, but the combination is
used here to show the flow of events from a generator and any plug-ins attached.

The client sends a build request to the generator. The first thing the generator does is to
send the request as an event to the DTD validator plug-in. The situation is similar to the
parser: the DTD validator plug-in validates that the data in the event it received is
correct. If it is not correct, it sends the same event that the generator sent to the validator
on to the auto corrector, but with an added error code that describes the problem the
validator encountered. If the event data is valid, the same event is sent to the auto
corrector unchanged. The auto corrector then receives the event and can check for any
errors. If there is an error, it can attempt to correct it. If it can correct the error, it
modifies the data in the event and removes the error code before sending the event back
to the generator. The major difference between the events during parsing and generating
is that, with generating, once the final plug-in has dealt with the event it is sent back to
the generator. The generator receives the event and builds up part of the document using
the details from the event.
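For generation, the same pattern applies, but the last link in the chain hands the event back to the generator, which writes the document. In this hypothetical C++ sketch the generator is split into a front end (`Generator`) that feeds the chain and a writer (`XmlWriterSink`) at its end. Skipping invalid events is an assumed policy, and the validator also records errors for the client, in the spirit of the development-time checking described earlier.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical build event sent down the chain during generation.
struct BuildEvent {
    enum Kind { kStart, kEnd } kind;
    std::string name;
    int error = 0;
};

class BuildCallback {
public:
    virtual ~BuildCallback() = default;
    virtual void OnBuild(BuildEvent event) = 0;
};

// Writing half of the generator: the final link in the chain.
class XmlWriterSink : public BuildCallback {
public:
    void OnBuild(BuildEvent event) override {
        if (event.error != 0) return;  // assumed policy: skip invalid parts
        out += (event.kind == BuildEvent::kStart ? "<" : "</") + event.name + ">";
    }
    std::string out;
};

// Validator plug-in: flags unknown elements, reports them to the client,
// then passes the event on down the chain.
class GenValidator : public BuildCallback {
public:
    GenValidator(std::set<std::string> allowed, BuildCallback& next,
                 std::vector<std::string>& errors)
        : allowed_(std::move(allowed)), next_(next), errors_(errors) {}
    void OnBuild(BuildEvent event) override {
        if (allowed_.count(event.name) == 0) {
            event.error = -1;
            errors_.push_back(event.name);  // error reported to the client
        }
        next_.OnBuild(event);
    }

private:
    std::set<std::string> allowed_;
    BuildCallback& next_;
    std::vector<std::string>& errors_;
};

// Front end of the generator: client build requests enter the chain here.
class Generator {
public:
    explicit Generator(BuildCallback& firstPlugin) : first_(firstPlugin) {}
    void BuildStartElement(const std::string& n) { first_.OnBuild({BuildEvent::kStart, n}); }
    void BuildEndElement(const std::string& n) { first_.OnBuild({BuildEvent::kEnd, n}); }

private:
    BuildCallback& first_;
};
```

The `GenValidator` class has the same shape as a parser-side validator, which illustrates why one validator plug-in can serve both directions: only the final link of the chain differs.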



Parsing and Generating WBXML

Parsing WBXML is quite different from parsing XML or HTML. The main difference is that
elements and attributes are defined as tokens rather than by their text representation.
This means a mapping needs to be stored between a WBXML token and its static string
representation. The Namespace plug-in for a particular namespace will store these
mappings. A WBXML parser and generator can then obtain a string from the
namespace plug-in given the WBXML token, and vice versa.
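A minimal sketch of such a token table, together with a decoder for the simplest part of a WBXML body. The table class is hypothetical, and the token values 0x05 and 0x06 are illustrative rather than taken from the SyncML specification. The decoder relies on the WBXML tag encoding (bit 0x40 set when the element has content, the low six bits identifying the tag, 0x01 as the END token) and deliberately ignores attributes, inline strings and code-page switches.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical namespace plug-in table mapping WBXML tag tokens to the
// static strings used by XML, in both directions.
class WbxmlTokenTable {
public:
    void Add(std::uint8_t token, const std::string& name) {
        toName_[token] = name;
        toToken_[name] = token;
    }
    // Parser direction: token -> string; nullptr if the token is unknown.
    const std::string* NameFor(std::uint8_t token) const {
        auto it = toName_.find(token);
        return it == toName_.end() ? nullptr : &it->second;
    }
    // Generator direction: string -> token; -1 if the name is unknown.
    int TokenFor(const std::string& name) const {
        auto it = toToken_.find(name);
        return it == toToken_.end() ? -1 : it->second;
    }

private:
    std::map<std::uint8_t, std::string> toName_;
    std::map<std::string, std::uint8_t> toToken_;
};

// Decodes a tag-only WBXML body into XML text using the table.
std::string DecodeTags(const std::vector<std::uint8_t>& body,
                       const WbxmlTokenTable& table) {
    const std::uint8_t kEnd = 0x01;
    std::string xml;
    std::vector<std::string> open;  // elements awaiting their END token
    for (std::uint8_t b : body) {
        if (b == kEnd) {
            if (open.empty()) break;
            xml += "</" + open.back() + ">";
            open.pop_back();
            continue;
        }
        const std::string* name = table.NameFor(static_cast<std::uint8_t>(b & 0x3F));
        std::string tag = name ? *name : std::string("unknown");
        if (b & 0x40) {  // element has content: leave it open
            xml += "<" + tag + ">";
            open.push_back(tag);
        } else {         // empty element
            xml += "<" + tag + "/>";
        }
    }
    return xml;
}
```

For instance, with 0x05 mapped to "SyncML" and 0x06 to "SyncHdr", the bytes 0x45 0x06 0x01 decode to `<SyncML><SyncHdr/></SyncML>`. A generator would use `TokenFor` for the reverse direction.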

Class Diagram

The class diagram for the mark-up framework is shown in an accompanying figure. The
diagram also depicts plug-ins that make use of the framework. The green classes (dark
grey in black and white) are the plug-ins that provide implementations for the mark-up
framework. CXmlParser and CWbxmlParser provide an implementation to parse
XML and WBXML documents respectively. In the same way, CXmlGenerator and
CWbxmlGenerator generate XML and WBXML documents respectively. CValidator is
a plug-in which will validate the mark-up document during parsing or generating.

CAutoCorrector is a plug-in that corrects invalid mark-up documents.

When parsing a document, and the client receives events such as the start of an element
(OnStartElementL), the element RString in the event is a handle to a string in
the string pool. If this is a known string, i.e. one that has been added by the Namespace
Plug-in, then the string will be static. Otherwise, if it is an unknown string, the parser will
add the string to the string pool as a dynamic string and return an RString with a handle
to this string. It is not possible to know whether an RString is dynamic or static, so the
parser or generator that obtains an RString must be sure to close it to ensure any memory
is released if the string is dynamic. A client that wishes to use the RString after the event
returns to the parser must make a copy of it, which will increase the reference count and
make sure it is not deleted when the parser closes it.
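The static/dynamic distinction and the close-and-copy rule can be modelled with reference counting. This sketch is a simplification, not the SymbianOS RString implementation: `RefCountedPool` and its methods are hypothetical, handles are plain integers, and lookup is a linear search.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical pool holding static strings (never freed) and
// reference-counted dynamic strings (freed when the last handle is closed).
class RefCountedPool {
public:
    // Namespace plug-in path: registers a known, static string.
    int AddStatic(const std::string& s) { return Add(s, /*isStatic=*/true); }

    // Parser path: returns the static handle for a known string, otherwise
    // adds (or re-uses) a dynamic string. The caller owns one reference.
    int Open(const std::string& s) {
        for (auto& e : entries_)
            if (e.second.text == s) { Copy(e.first); return e.first; }
        return Add(s, /*isStatic=*/false);
    }

    // Copying a handle bumps the count; a no-op for static strings.
    void Copy(int h) {
        Entry& e = entries_.at(h);
        if (!e.isStatic) ++e.refs;
    }

    // Closing is always safe: static strings ignore it, dynamic ones are
    // erased when their count reaches zero.
    void Close(int h) {
        Entry& e = entries_.at(h);
        if (!e.isStatic && --e.refs == 0) entries_.erase(h);
    }

    bool Exists(int h) const { return entries_.count(h) != 0; }

private:
    struct Entry {
        std::string text;
        bool isStatic;
        int refs;
    };
    int Add(const std::string& s, bool isStatic) {
        entries_[next_] = Entry{s, isStatic, 1};
        return next_++;
    }
    std::map<int, Entry> entries_;
    int next_ = 0;
};
```

In this model the parser can unconditionally `Close` every handle it obtained, and a client that called `Copy` during the event keeps a dynamic string alive afterwards, which mirrors the rule stated in the paragraph above.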

An accompanying example class diagram shows the major classes for parsing WBXML
SyncML documents. The client creates a CDescriptorDataSupplier that supplies the data
to the parser. CWbxmlParser is the class that actually parses the document.
CSyncMLNamespace is the namespace for SyncML that the parser uses to map WBXML
tokens to strings. All the other classes belong to the mark-up framework. To parse a
document with different namespaces, the only thing that needs to be added is a plug-in
for each namespace.

Class Dictionary 
Object name / Description / Associated (owned/dependent) objects

MMarkupCallback
    Description: A call-back that a client must implement so that the parser can report
    events back to the client during the parsing session.
    Associated objects: Inherited by clients and plug-ins.

RNamespaceCollection
    Description: Contains a collection of namespaces. Contains a reference counter so
    multiple parsers or generators may use the same namespace collection.
    Associated objects: Owned by either CParserSession or CGeneratorSession. Owns an
    array of CMarkupNamespace plug-ins.

CMarkupNamespace
    Description: ECOM interface to implement a namespace.
    Associated objects: Inherited by any namespace plug-ins.

RParserSession
    Description: Public interface for a client to create a parser session.
    Associated objects: Owned by the client.

RGeneratorSession
    Description: Public interface for a client to create a generator session.
    Associated objects: Owned by the client.

CMarkupCharSetConverter
    Description: Helper class which uses CCnvCharacterSetConverter for the client,
    parser and generator to do any character set conversions or resolving of MIB Enums
    or Internet-standard names of character sets.
    Associated objects: Owned by RParserSession and RGeneratorSession.

CMarkupPluginBase
    Description: Generic interface for any type of plug-in.
    Associated objects: Inherited by CMarkupPlugin, CParserSession and
    CGeneratorSession.

CMarkupPlugin
    Description: ECOM interface for plug-ins to be used by the parser and generator.
    Associated objects: Owned by CParserSession or CGeneratorSession.

MDataSupplierReader
    Description: Pure virtual interface to be implemented by a data supplier for
    reading data.
    Associated objects: Inherited by the client's data provider.

MDataSupplierWriter
    Description: Pure virtual interface to be implemented by a data supplier for
    writing data.
    Associated objects: Inherited by the client's data provider.

CParserSession
    Description: ECOM interface for parser plug-ins.
    Associated objects: Inherited by a concrete parser implementation.

CGeneratorSession
    Description: ECOM interface for generator plug-ins.
    Associated objects: Inherited by a concrete generator implementation.

RAttribute
    Description: Contains the name and value of an attribute.
    Associated objects: Used by the parser, generator and client.
The classes below are not part of the framework but illustrate how the framework can be used.

CValidator
    Description: A DTD, schema or some other type of validator.
    Associated objects: Owned by RParserSession or RGeneratorSession.

CAutoCorrector
    Description: Used to auto correct invalid data.
    Associated objects: Owned by RParserSession or RGeneratorSession.

CXmlParser
    Description: An XML parser implementation.
    Associated objects: Owned by RParserSession.

CWbxmlParser
    Description: A WBXML parser implementation.
    Associated objects: Owned by RParserSession.

CXmlGenerator
    Description: An XML generator implementation.
    Associated objects: Owned by RGeneratorSession.

CWbxmlGenerator
    Description: A WBXML generator implementation.
    Associated objects: Owned by RGeneratorSession.

CNamespace
    Description: A namespace plug-in to use with a parser and generator.
    Associated objects: Owned by RNamespaceCollection.

RElementStack
    Description: A stack of the currently processed elements during parsing or
    generating.
    Associated objects: Owned by CParserSession and CGeneratorSession.



Detailed Design 
RParserSession



The following is the public API for this class:



void OpenL(MDataSupplierReader& aReader,
           const TDesC8& aMarkupMimeType,
           const TDesC8& aDocumentMimeType,
           MMarkupCallback& aCallback)
    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.

void OpenL(MDataSupplierReader& aReader,
           const TDesC8& aMarkupMimeType,
           const TDesC8& aDocumentMimeType,
           MMarkupCallback& aCallback,
           RMarkupPlugins aPlugins[])
    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list
    is the first plug-in to be called back from the parser. The first plug-in will then call
    back the second plug-in, and so on.

void OpenL(MDataSupplierReader& aReader,
           const TDesC8& aMarkupMimeType,
           const TDesC8& aDocumentMimeType,
           MMarkupCallback& aCallback,
           RMarkupPlugins aPlugins[],
           RNamespaceCollection aNamespaceCollection)
    Opens a parser session.
    aReader is the data supplier reader to use during parsing.
    aMarkupMimeType is the MIME type of the parser to open.
    aDocumentMimeType is the MIME type of the document to parse.
    aCallback is a reference to the call-back so the parser can report events.
    aPlugins is an array of plug-ins to use with the parser. The first plug-in in the list
    is the first plug-in to be called back from the parser. The first plug-in will then call
    back the second plug-in, and so on.
    aNamespaceCollection is a handle to a previous namespace collection. This is useful
    if a generator or another parser session has been created, so that the same
    namespace collection can be shared.

void Close()
    Closes the parser session.

void Start()
    Starts parsing the document.

void Stop()
    Stops parsing the document.

void Reset(MDataSupplierReader& aReader,
           MMarkupCallback& aCallback)
    Resets the parser ready to parse a new document.
    aReader is the data supplier reader to use during parsing.
    aCallback is a reference to the call-back so the parser can report events.

TInt SetParseMode(TInt aParseMode)
    Selects one or more parse modes.
    aParseMode is one or more of the following:
        EConvertTagsToLowerCase - Converts elements and attributes to lowercase. This
        can be used for case-insensitive HTML so that a tag can be matched to a static
        string in the string pool.
        EErrorOnUnrecognisedTags - Reports an error when unrecognised tags are found.
        EReportUnrecognisedTags - Reports unrecognised tags.
        EReportNamespaces - Reports the namespace.
        EReportNamespacePrefixes - Reports the namespace prefix.
        ESendFullContentInOneChunk - Sends all content data for an element in one
        chunk.
        EReportNameSpaceMapping - Reports namespace mappings via the
        DoStartPrefixMapping() & DoEndPrefixMapping() methods.
    If this function is not called, the default is:
        EReportUnrecognisedTags | EReportNamespaces
    If the parsing mode is not supported, KErrNotSupported is returned.
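A sketch of how a parser plug-in might implement SetParseMode() with bit flags. The flag names come from the table above, but their numeric values, the `ToyParserSession` class and its choice of unsupported mode are assumptions. KErrNone (0) and KErrNotSupported (-5) follow the usual SymbianOS error-code values. Rejecting a request leaves the previous mode unchanged, which is also an assumed behaviour.

```cpp
#include <cassert>

// Illustrative bit values: the document names these flags but not their values.
enum TParseMode {
    EConvertTagsToLowerCase    = 1 << 0,
    EErrorOnUnrecognisedTags   = 1 << 1,
    EReportUnrecognisedTags    = 1 << 2,
    EReportNamespaces          = 1 << 3,
    EReportNamespacePrefixes   = 1 << 4,
    ESendFullContentInOneChunk = 1 << 5,
    EReportNameSpaceMapping    = 1 << 6
};

constexpr int KErrNone = 0;
constexpr int KErrNotSupported = -5;

// Hypothetical parser that supports every mode except prefix reporting.
class ToyParserSession {
public:
    int SetParseMode(int aParseMode) {
        const int supported = EConvertTagsToLowerCase | EErrorOnUnrecognisedTags |
                              EReportUnrecognisedTags | EReportNamespaces |
                              ESendFullContentInOneChunk | EReportNameSpaceMapping;
        if ((aParseMode & ~supported) != 0) return KErrNotSupported;  // reject
        iParseMode = aParseMode;
        return KErrNone;
    }
    int ParseMode() const { return iParseMode; }

private:
    // Default used when SetParseMode() is never called, as stated above.
    int iParseMode = EReportUnrecognisedTags | EReportNamespaces;
};
```

A client combines modes with `|`, e.g. `session.SetParseMode(EConvertTagsToLowerCase | EReportNamespaces)`, and checks the return value for KErrNotSupported before relying on the new mode.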






RGeneratorSession

The following is the public API for this class: 


Method ~ ^"^ " 


Description " ~ | 


void OpenL( 

MDataSupplierWriter& aWriter, 


Opens a generator session. 

aWriter is the data supplier writer used' to generate a 
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TUid aMarkupJVumexype, 
const TD esC 8 &' 
aDociimentMimeType) 


QOC U n 1 CI i L» 

aMarkupMimeType is the MIME type of the generator to 
open. 

aDocumentMimeType is the MIME type of the 
dociiment to parse. 


void OpenL( 

MDataSupplierWriter& aWriter, 
TUid aMarkupMimeType, 
const ijjesi^ooc 
aDocumentMimeType, 
RMatkupPliagins aPluginsQ) 


Opens a generator session. 

aWriter is the data supplier writer used to generate a 
docviment. 

aMarkuoMimeType is the MIME type of the generator to 
Open. 

aDocumentMimeType is the MIME type of the 
document to parse. 

aPlugins is an array of plug-ins to use with the generator. 


void OpenL( 

MDataSupplierWtiter& aWiiter, 
TUid aMarkupMimeType, 
const TDesC8& 
aDocumentMimeType, 
RMarkupPlugins aPluginsQ, 
RNamespaceCoUection 
aNamespaceCollection) 


Opens a generator session. 

aWriter is the data supplier writer used to generate a 
document. 

aMarkupMimeType is the MIME type of the generator to 
open. 

aDocumentMimeType is the MIME type of the 
document to parse. 

aPluoins is an array of plug-ins to use with the generator. 
aNamespaceCollection is a handle to a previous 
namespace collection. This is useful if a generator or 
another parser session has been created so that same 
namespace collection can be shared. 


void QoseQ 


Closes the generator session. 


void Reset(
MDataSupplierWriter& aWriter,
MMarkupCallback& aCallback)


Resets the generator ready to generate a new document.
aWriter is the data supplier writer used to generate a
document.
aCallback is a reference to the call-back so the generator
can report events.


void BuildStartDocumentL(
RDocumentParameters aDocParam);


Builds the start of the document.

aDocParam specifies the various parameters of the
document. In the case of WBXML, these would state the
public ID and string table.






void BuildStartElementL(
RTagInfo& aElement,
RAttributeArray& aAttributes)


Builds the start element with attributes and namespace if
specified.

aElement is a handle to the element's details.
aAttributes contains the attributes for the element.


void BuildEndElementL(
RTagInfo& aElement) 


Builds the end of the element. 

aElement is a handle to the element's details. 


void BuildContentL(
const TDesC8& aContentPart)


Builds part or all of the content. Large content should be
built in chunks, i.e. this function should be called once
for each chunk.

aContentPart is the raw content data. This data must be
converted to the correct character set by the client.


void BuildPrefixMappingL(
RString& aPrefix,
RString& aUri)


Builds a prefix-URI namespace mapping for the next element
to be built. This method can be called for each namespace
that needs to be declared.

aPrefix is the Namespace prefix being declared.

aUri is the Namespace URI the prefix is mapped to.


void BuildProcessingInstructionL(
RString& aTarget,
RString& aData)


Builds a processing instruction.

aTarget is the processing instruction target.

aData is the processing instruction data.
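The value of RGeneratorSession is that the client talks to one interface whatever the output format. The following is a minimal sketch of that idea in standard C++ rather than Symbian C++. The names Generator, TextXmlGenerator and WritePhone are hypothetical, attributes are simplified to name/value pairs, and no escaping is done. The sketch only shows that client code written against the common interface stays the same when a different generator plug-in is supplied.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an attribute array: (name, value) pairs.
using Attributes = std::vector<std::pair<std::string, std::string>>;

// Common interface the client is written against, whatever the output format.
class Generator {
public:
    virtual ~Generator() = default;
    virtual void BuildStartElement(const std::string& name, const Attributes& attrs) = 0;
    virtual void BuildContent(const std::string& chunk) = 0;
    virtual void BuildEndElement(const std::string& name) = 0;
    virtual std::string Output() const = 0;
};

// One concrete plug-in: textual XML output, with no escaping for brevity.
class TextXmlGenerator : public Generator {
public:
    void BuildStartElement(const std::string& name, const Attributes& attrs) override {
        out_ += "<" + name;
        for (const auto& attr : attrs) out_ += " " + attr.first + "=\"" + attr.second + "\"";
        out_ += ">";
    }
    void BuildContent(const std::string& chunk) override { out_ += chunk; }
    void BuildEndElement(const std::string& name) override { out_ += "</" + name + ">"; }
    std::string Output() const override { return out_; }

private:
    std::string out_;
};

// Client code depends only on Generator, so a binary (WBXML) plug-in
// implementing the same interface could be passed in unchanged.
std::string WritePhone(Generator& gen) {
    gen.BuildStartElement("Phone", {{"Type", "Mobile"}});
    gen.BuildContent("0123");  // content may be built in several chunks
    gen.BuildContent("456");
    gen.BuildEndElement("Phone");
    return gen.Output();
}
```

With TextXmlGenerator, WritePhone returns `<Phone Type="Mobile">0123456</Phone>`; a WBXML generator would emit tokens instead, but WritePhone itself would not change.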


RTagInfo

The following is the public API for this class: 


Method 


Description


void Open(
RString& aUri,
RString& aPrefix,
RString& aLocalName)


Sets the tag information for an element or attribute.
aUri is the URI of the namespace.
aPrefix is the prefix of the qualified name.
aLocalName is the local name of the qualified name.


void Close()


Closes the tag information.


RString& Uri()


Returns the URI.


RString& LocalName()


Returns the local name.


RString& Prefix()


Returns the prefix.


RNamespaceCollection

The following is the public API for this class:


Method 


Description


void Connect()


Every time this method is called, a reference counter is
incremented so that the namespace collection is only
destroyed when no clients are using it.


void Close()


Every time this method is called, the reference counter is
decremented, and the object is destroyed only when the
reference counter reaches zero.


const CMarkupNamespace&
OpenNamespaceL(
const TDesC8& aMimeType)


Opens a namespace plug-in and returns a reference to the
namespace plug-in. If the namespace plug-in is not loaded, it
will be loaded automatically.

aMimeType is the MIME type of the plug-in to open.


const CMarkupNamespace&
OpenNamespaceL(
TUint8 aCodePage)


Opens a namespace plug-in and returns a reference to the
namespace plug-in.

aCodePage is the code page of the plug-in to open.


void Reset()


Resets the namespace collection and string pool.


RStringPool StringPool()


Returns a handle to the string pool object.
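The reference counting described for Connect() and Close(), together with load-on-first-use in OpenNamespaceL(), might look like the following sketch. It is standard C++ with hypothetical names: a string stands in for the namespace plug-in, and clearing the map stands in for destroying the collection.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of RNamespaceCollection's shared-ownership behaviour.
class NamespaceCollection {
public:
    // Each client that shares the collection calls Connect() once.
    void Connect() { ++refCount_; }

    // Returns true when the last user has closed and the contents were released.
    bool Close() {
        assert(refCount_ > 0);
        if (--refCount_ == 0) {
            namespaces_.clear();  // stands in for destroying the object
            return true;
        }
        return false;
    }

    // Loads a namespace on first use; later calls reuse the loaded one.
    const std::string& OpenNamespace(const std::string& mimeType) {
        auto it = namespaces_.find(mimeType);
        if (it == namespaces_.end()) {
            ++loads_;
            it = namespaces_.emplace(mimeType, "plug-in for " + mimeType).first;
        }
        return it->second;
    }

    int RefCount() const { return refCount_; }
    int Loads() const { return loads_; }
    bool Empty() const { return namespaces_.empty(); }

private:
    int refCount_ = 0;
    int loads_ = 0;
    std::map<std::string, std::string> namespaces_;
};
```

A parser session and a generator session sharing one collection would each call Connect(); the plug-ins stay loaded until the second Close().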


CMarkupNamespace

The following is the API for this class: 


Method


Description 


void NewL(RStringPool
aStringPool)


Creates the namespace plug-in.

aStringPool is a handle to the string pool to which the
static string tables are added.


RString& Element(
TUint8 aWbxmlToken) const


Returns a handle to the element string.

aWbxmlToken is the WBXML token of the element.


void AttributeValuePair(
TUint8 aWbxmlToken,
RString& aAttribute,
RString& aValue) const


Returns handles to the attribute and value strings.
aWbxmlToken is the WBXML token of the attribute.
aAttribute is the handle to the attribute string.
aValue is the handle to the value string.


RString& AttributeValue(
TUint8 aWbxmlToken) const


Returns a handle to an attribute value.
aWbxmlToken is the WBXML token of the attribute.




RString& NamespaceUri() const


Returns the namespace name. 




TUint8 CodePage() const


Returns the code page for this namespace. 


RTableCodePage 

The following is the API for this class:




Method 


Description 




RString NameSpaceUri()


Returns the namespace URI for this code page. 




TInt StringPoolIndexFromToken(
TInt aToken);


Gets a StringPool index from a token value. -1 is returned if
the item is not found.




TInt TokenFromStringPoolIndex(
TInt aIndex);


Gets a token value from a StringPool index. -1 is returned if
the item is not found.
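A minimal sketch of the two lookup directions of RTableCodePage follows. It assumes one-byte WBXML tokens and a string table whose entries are numbered from zero in table order; that index scheme is an assumption, not something this document specifies. Two arrays are kept so that each direction is a single indexed lookup, and -1 signals a missing item as in the API above. The class name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of RTableCodePage: one list indexed on token,
// one indexed on string-table position, so both directions are O(1).
class TableCodePage {
public:
    // tokens[i] is the WBXML token of string-table entry i.
    explicit TableCodePage(const std::vector<int>& tokens)
        : indexToToken_(tokens), tokenToIndex_(256, -1) {
        for (int i = 0; i < static_cast<int>(tokens.size()); ++i) {
            assert(tokens[i] >= 0 && tokens[i] < 256);  // one-byte tokens assumed
            tokenToIndex_[tokens[i]] = i;
        }
    }

    // Returns -1 if the token is not in this code page.
    int StringPoolIndexFromToken(int token) const {
        if (token < 0 || token >= 256) return -1;
        return tokenToIndex_[token];
    }

    // Returns -1 if the index is outside the string table.
    int TokenFromStringPoolIndex(int index) const {
        if (index < 0 || index >= static_cast<int>(indexToToken_.size())) return -1;
        return indexToToken_[index];
    }

private:
    std::vector<int> indexToToken_;
    std::vector<int> tokenToIndex_;
};
```

With the element tokens of Table 1 in Appendix B (Addr 5, AddType 6, Auth 7, AuthLevel 8), token 7 maps to index 2 and index 3 maps back to token 8.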



CMarkupPluginBase

The following is the API for this ECOM class: 



Method 


Description 


CMarkupPluginBase& RootPlugin()


Returns a reference to the root plug-in. This must be either
a parser or a generator plug-in.


CMarkupPluginBase&
ParentPlugin()


Returns a reference to the parent plug-in.


RElementStack& ElementStack()


Returns a handle to the element stack.


RNamespaceCollection&
NamespaceCollection()


Returns a handle to the namespace collection.


CMarkupCharSetConverter&
CharSetConverter()


Returns a reference to the character set converter object.


TBool IsChildElementValid(
RString& aParentElement,
RString& aChildElement)


Checks whether aChildElement is a valid child of
aParentElement.



CMarkupPlugin 

The following is the API for this ECOM class:



Method 



Description 
CMarkupPlugin* NewL(
MMarkupCallback& aCallback)


Creates an instance of a mark-up plug-in.

aCallback is a reference to the call-back to report events.


void SetParent(
CMarkupPluginBase* aParentPlugin)


Sets the parent plug-in for this plug-in.
aParentPlugin is a pointer to the parent plug-in, or NULL
if there is no parent. A parser or generator does not have a
parent, so this must not be set for them; the default NULL
indicates that there is no parent.



CParserSession 

The following is the API for this ECOM class: 



Method


Description


CParserSession* NewL(
MDataSupplierReader& aReader,
const TDesC8& aMarkupMimeType,
const TDesC8& aDocumentMimeType,
MMarkupCallback& aCallback,
RNamespaceCollection* aNamespaceCollection,
CMarkupCharSetConverter& aCharSetConverter)


Opens a parser session.
aReader is the data supplier reader to use during parsing.
aMarkupMimeType is the MIME type of the parser to
open.

aDocumentMimeType is the MIME type of the
document to parse.
aCallback is a reference to the call-back so the parser can
report events.

aNamespaceCollection is a handle to a previous
namespace collection. Set to NULL if a new
RNamespaceCollection is to be used.
aCharSetConverter is a reference to the character set
conversion class.


void Start()


Starts parsing the document.


void Stop()


Stops parsing the document.


void Reset(
MDataSupplierReader& aReader,
MMarkupCallback& aCallback)


Resets the parser ready to parse a new document.
aReader is the data supplier reader to use during parsing.
aCallback is a reference to the call-back so the parser can
report events.


void SetParseMode(
TInt aParseMode)


Selects one or more parse modes.
See RParserSession for details on aParseMode.
CGeneratorSession

The following is the API for this ECOM class:



Method


Description 


void OpenL(
MDataSupplierWriter& aWriter,
TUid aMarkupMimeType,
const TDesC8& aDocumentMimeType,
MMarkupCallback& aCallback,
RNamespaceCollection* aNamespaceCollection,
CMarkupCharSetConverter& aCharSetConverter)


Opens a generator session.

aWriter is the data supplier writer used to generate a
document.

aMarkupMimeType is the MIME type of the generator to
open.

aDocumentMimeType is the MIME type of the
document to generate.

aCallback is a reference to the call-back so the generator
can report events.

aNamespaceCollection is a handle to a previous
namespace collection. Set to NULL if a new
RNamespaceCollection is to be used.
aCharSetConverter is a reference to the character set
conversion class.


void Reset(
MDataSupplierWriter& aWriter,
MMarkupCallback& aCallback)


Resets the generator ready to generate a new document.
aWriter is the data supplier writer used to generate a
document.

aCallback is a reference to the call-back so the generator
can report events.


void BuildStartDocumentL(
RDocumentParameters aDocParam);


Builds the start of the document.

aDocParam specifies the various parameters of the
document.


void BuildEndDocumentL()


Builds the end of the document. 


void BuildStartElementL(
RTagInfo& aElement,
RAttributeArray& aAttributes)


Builds the start element with attributes and namespace if
specified.

aElement is a handle to the element's details. 
aAttributes contains the attributes for the element. 


void BuildEndElementL(
RTagInfo& aElement)


Builds the end of the element.

aElement is a handle to the element's details. 


void BuildContentL(
const TDesC8& aContentPart)


Builds part or all of the content. Large content should be
built in chunks, i.e. this function should be called once
for each chunk.

aContentPart is the raw content data. This data must be
converted to the correct character set by the client.


void BuildProcessingInstructionL(
RString& aTarget,
RString& aData)


Builds a processing instruction.

aTarget is the processing instruction target. 
aData is the processing instruction data. 


RAttribute

The following is the API for this class: 


Method 


Description


RTagInfo& Attribute()


Returns a handle to the attribute's name details.


TAttributeType Type()


Returns the attribute's type, where TAttributeType is one
of the following enumeration values:

CDATA

ID

IDREF

IDREFS

NMTOKEN

NMTOKENS

ENTITY

ENTITIES

NOTATION


RString& Value()


Returns a handle to the attribute value. If the attribute value
is a list of tokens (IDREFS, ENTITIES or NMTOKENS),
the tokens will be concatenated into a single RString with
each token separated by a single space.


MDataSupplierReader 

The following is the API for this mix-in class:


Method


Description 
TUint8 GetByteL()


Gets a single byte from the data supplier.


const TDesC8& GetBytesL(
TInt aNumberOfBytes)


Gets a descriptor of size aNumberOfBytes. If that number
of bytes is not available, this method leaves with KErrEof.
The returned descriptor must not be deleted until another
call to GetBytesL() or EndTransactionL() is made.


void StartTransactionL()


The parser calls this to indicate the start of a transaction.


void EndTransactionL()


The parser calls this to indicate the transaction has ended.
Any data stored for the transaction may now be deleted.


void RollbackL()


The parser calls this to indicate that the transaction must be
rolled back to the exact state it was in when
StartTransactionL() was called.

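The transaction contract lets a parser read ahead and back off if the data turns out to be incomplete. The following is a minimal in-memory sketch, assuming a byte vector as the backing store and a C++ exception standing in for a leave with KErrEof. The class name BufferReader is hypothetical, and this is not the Symbian data supplier itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory sketch of the MDataSupplierReader transaction contract.
class BufferReader {
public:
    explicit BufferReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    // Mirrors GetByteL(): throws instead of leaving with KErrEof.
    std::uint8_t GetByte() {
        if (pos_ >= data_.size()) throw std::out_of_range("KErrEof");
        return data_[pos_++];
    }

    // Mirrors GetBytesL(): all-or-nothing; nothing is consumed on failure.
    std::vector<std::uint8_t> GetBytes(std::size_t count) {
        if (data_.size() - pos_ < count) throw std::out_of_range("KErrEof");
        auto first = data_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::vector<std::uint8_t> out(first, first + static_cast<std::ptrdiff_t>(count));
        pos_ += count;
        return out;
    }

    // Remember where the transaction began.
    void StartTransaction() { mark_ = pos_; }
    // Bytes before the current position are no longer needed for rollback.
    void EndTransaction() { mark_ = pos_; }
    // Return to the exact state at StartTransaction().
    void Rollback() { pos_ = mark_; }

    std::size_t Position() const { return pos_; }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
    std::size_t mark_ = 0;
};
```

In this sketch, a parser that runs out of data mid-construct could call Rollback(), report OnOutOfDataL(), and re-read the same bytes when parsing is restarted.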

MDataSupplierWriter 

The following is the API for this mix-in class: 


Method 


Description 


void PutByteL(
TUint8 aByte)


Puts a byte in the data supplier.


void PutBytesL(
const TDesC8& aBytes)


Puts a descriptor in the data supplier.


MMarkupCallback 

The following is the API for this mix-in class: 


Method 


Description 


void OnStartDocumentL(
RDocumentParameters aDocParam,
TInt aErrorCode);


Callback to indicate the start of the document.
aDocParam specifies the various parameters of the
document.

aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnEndDocumentL(
TInt aErrorCode);


Indicates that the end of the document has been reached.
aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnStartElementL(
RTagInfo& aElement,
RAttributeArray& aAttributes,
TInt aErrorCode);


Callback to indicate an element has been parsed.
aElement is a handle to the element's details.
aAttributes contains the attributes for the element.
aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnEndElementL(
RTagInfo& aElement,
TInt aErrorCode);


Callback to indicate the end of the element has been
reached.

aElement is a handle to the element's details.

aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnContentL(
const TDesC8& aBytes,
TInt aErrorCode)


Sends the content of the element. The content may not all
be returned in one go; it may be sent in chunks. When
OnEndElementL is received, there is no more content to
be sent.

aBytes is the raw content data for the element. The client is
responsible for converting the data to the required character
set if necessary. In some instances, with WBXML opaque
data, the content may be binary and must not be converted.
aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnStartPrefixMappingL(
RString& aPrefix,
RString& aUri,
TInt aErrorCode)


Notification of the beginning of the scope of a prefix-URI
Namespace mapping. This method is always called before
the corresponding OnStartElementL method.
aPrefix is the Namespace prefix being declared.
aUri is the Namespace URI the prefix is mapped to.
aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnEndPrefixMappingL(
RString& aPrefix,
TInt aErrorCode)


Notification of the end of the scope of a prefix-URI
mapping. This method is called after the corresponding
OnEndElementL method.

aPrefix is the Namespace prefix that was mapped.
aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnIgnoreableWhiteSpaceL(
const TDesC8& aBytes,
TInt aErrorCode)


Notification of ignorable whitespace in element content.
aBytes are the ignored bytes from the document being
parsed.

aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnSkippedEntityL(
RString& aName,
TInt aErrorCode)


Notification of a skipped entity. If the parser encounters an
external entity, it does not need to expand it; it can return
the entity as aName for the client to deal with.

aName is the name of the skipped entity.

aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnProcessingInstructionL(
const TDesC8& aTarget,
const TDesC8& aData,
TInt aErrorCode)


Receive notification of a processing instruction.
aTarget is the processing instruction target.
aData is the processing instruction data. If empty, none was
supplied.

aErrorCode is the error code. If this is not KErrNone, then
special action may be required.


void OnOutOfDataL()


There is no more data in the data supplier to parse. If there
is more data to parse, Start() should be called once that
data is in the supplier, to continue parsing.


void OnError(TInt aError)


An error has occurred; aError is the error code.



Sequence Diagrams 

Setting up, parsing and generating 

The sequence diagram for this case shows the interaction of the client with the various
parser objects to create a parser and generator session. The parsing of a simple document
with only one element and the generation of one element are shown. It is assumed that a DTD
validator and an auto correct component are used. Auto correction in this example is only
used with the parser. The generator only checks that tags are DTD compliant but does
not try to correct any DTD errors.

Element not valid at current level in DTD 

Auto correction is left up to the plug-in implementers, who decide how and what should be
corrected.

The sequence diagram in Figure 4 shows an example of what is possible when the
format of the document is valid but there is an invalid element (C) that should be at a
different level, as shown in the example document below.



<A> Content
<B>
<C> // Not valid for the DTD; should be a root element.
Some content
</C>
</B>
</A>
// <C> should go here



The bad element is detected by the DTD validator and sent to the auto correct
component. The auto corrector realises from the error code passed in the call-back that
this element has an error, tries to find out where the element should go, and sends
back the appropriate OnEndElementL() call-backs to the client.

Scenarios
Set-up a parser to parse WBXML without any plug-ins.

Scenario to parse the following document: 
<A>
<B>
Content
</B>
</A>
1. The client creates a data supplier that contains the data to be parsed.
2. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed and the call-back pointer
where parsing events are to be received.

3. The client begins the parsing by calling Start() on the parser session.

4. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()

Set-up a parser to parse WBXML with a validator plug-in 
The same document as 5.1 is used in this scenario. 

1. The client creates a data supplier that contains the data to be parsed.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates an RParserSession passing in the data supplier, MIME type for
WBXML, the MIME type of the document to be parsed, the call-back pointer where
parsing events are to be received and the array of plug-ins object.

4. The parser session first iterates through the array of plug-ins starting from the end of
the list. It creates the CValidator ECOM object, setting its call-back to the client.
The CWbxmlParser ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the parser
through to the validator and then the client. The validator needs access to data from
the parser, so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the parser object.

5. The client begins the parsing by calling Start() on the parser session.

6. The parser makes the following call-backs to the client:

OnStartDocumentL()
OnStartElementL('A')
OnStartElementL('B')
OnContentL('Content')
OnEndElementL('B')
OnEndElementL('A')
OnEndDocumentL()
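The call-back chain set up in step 4 can be sketched in standard C++ as follows. The names are hypothetical, only one call-back is modelled, and SetParent() is left out. Each plug-in appends its own name to the element name purely so that the order in which the stages see the event is visible. Walking the plug-in list from the end leaves the first listed plug-in directly after the parser.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Every stage receives events through this interface.
struct Callback {
    virtual ~Callback() = default;
    virtual void OnStartElement(const std::string& name) = 0;
};

// The client records what finally reaches it.
struct Client : Callback {
    std::vector<std::string> events;
    void OnStartElement(const std::string& name) override { events.push_back(name); }
};

// A plug-in does its own work, then forwards to the next stage.
struct Plugin : Callback {
    Plugin(std::string tag, Callback& next) : tag(std::move(tag)), next(next) {}
    void OnStartElement(const std::string& name) override {
        next.OnStartElement(name + "/" + tag);  // mark that this stage saw the event
    }
    std::string tag;
    Callback& next;
};

struct Chain {
    std::vector<std::unique_ptr<Plugin>> plugins;  // owns the plug-ins
    Callback* head = nullptr;                      // where the parser sends events
};

// Build the chain from the end of the list back towards the parser.
Chain BuildChain(const std::vector<std::string>& names, Client& client) {
    Chain chain;
    Callback* next = &client;
    for (auto it = names.rbegin(); it != names.rend(); ++it) {
        chain.plugins.push_back(std::make_unique<Plugin>(*it, *next));
        next = chain.plugins.back().get();
    }
    chain.head = next;  // with no plug-ins this is the client itself
    return chain;
}
```

A parser handed `chain.head` as its call-back would send an element event through the validator, then the auto corrector, then the client, matching the order in which the example in this document appends the plug-ins.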

Generating a WBXML document with a DTD validator

The document in 5.1 is to be generated in this scenario.

1. The client creates a data supplier with an empty buffer.

2. The client constructs an RMarkupPlugins object with the UID of a validator.

3. The client creates an RGeneratorSession passing in the data supplier, MIME type for
WBXML, MIME type of the document to be generated and the array of plug-ins object.

4. The generator session first iterates through the array of plug-ins starting from the end
of the list. It creates the CValidator ECOM object, setting its call-back to the client.
The CWbxmlGenerator ECOM object is created next and its call-back is set to the
CValidator object. This sets up the chain of call-back events from the generator
through to the validator and then the client. The validator needs access to data from
the generator, so SetParent needs to be called on all the plug-ins in the array. The
validator sets its parent to the generator object.

5. The client then calls the following methods:

BuildStartDocumentL()

BuildStartElementL('A')

BuildStartElementL('B')

BuildContentL('Content')

BuildEndElementL('B')

BuildEndElementL('A')



Design Considerations 

• ROM/RAM memory strategy - the string pool is used to minimise duplicate strings.
• Error condition handling - errors are returned to plug-ins and the client via the
call-back API.
• Localisation issues - documents can use any character set. When parsing, the character
set is returned to the client so it knows how to deal with the data. When generating, the
client can set the character set of the document.
• Performance considerations - the string pool makes string comparisons efficient.
• Platform Security - in normal usage the parser and generator do not need any
capabilities. However, if a plug-in were designed to load a DTD from the Internet, it
would require PhoneNetwork capabilities.
• Modularity - all components in the framework are ECOM components that can be
replaced or added to in the future.

Testing 

The data supplier and the parser and generator set-up components can be tested individually.
All the functions are synchronous, so no active objects need to be created for
testing.

The following steps can be carried out to test parsing and generating XML:

1. Load a pre-created file.
2. Parse the file.
3. Generate a buffer from the output of the parser.
4. Compare the buffer with the original pre-created file to see if they
match.

Additional tests are carried out to test error conditions of parsing, such as badly
formatted documents and corrupt documents.
Open Issues

The following issues need to be resolved before this document is completed:

1. If a plug-in requires capabilities to connect to the Internet, what capabilities does the
framework need?

2. The API for CMarkupCharSetConverter and RDocumentParameters needs to
be decided.

Glossary
The following technical terms and abbreviations are used within this document.



XML
Extensible Markup Language

WBXML
WAP Binary Extensible Markup Language

SAX
Simple API for XML

DOM
Document Object Model

Element
A tag enclosed by angle brackets, e.g. <Name>, <Address>, <Phone>.

Attributes
The attributes associated with an element, e.g. in <Phone Type="Mobile"> the
attribute is "Type".

Values
The actual value of an attribute, e.g. in <Phone Type="Mobile"> the value is
"Mobile".

Content
The actual content of an element, e.g. in <Name>Symbian</Name>, "Symbian" is the
content of the element "Name".
Term


Definition 




code page can have 32 elements. 


XSLT 


Extensible Stylesheet Language Transformations


SOAP 


Simple Object Access Protocol 


URI 


Uniform Resource Identifiers 


qualified name 


A qualified name has the form prefix:local name, e.g. 'HTML:B'.


prefix 


From the qualified name example this is 'HTML' 


local name 


From the qualified name example this is 'B'
Appendix A - Auto correction examples

Table A1 shows a situation where the end tags are the wrong way round for A and B.
This is easy to fix: since the DTD validator keeps a stack of the tags, it knows what
the end tag should be.



<A>Content 
<B> 

More content 
</A> 

</B> 



Table A1: End tags that are the wrong way round
Table A2 shows the situation where the B end tag is missing. Since the end tag does not
match, a guess can be made that there should be an end tag for B before the end tag of A.

<A>Content
<B>

More content

</A> 



Table A2: Missing end tag
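The stack-based repair behind Tables A1 and A2 can be sketched as follows. This is an illustration in standard C++ with a hypothetical class name; content is omitted and events are kept as tag strings. Silently dropping an end tag that matches no open element (the late </B> in Table A1) is an assumption, since the text does not say how that tag is handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of end-tag repair using a stack of open elements.
class EndTagCorrector {
public:
    void StartElement(const std::string& name) {
        stack_.push_back(name);
        out_.push_back("<" + name + ">");
    }

    void EndElement(const std::string& name) {
        // Search the open-element stack from the top for the matching start tag.
        for (auto i = stack_.size(); i > 0; --i) {
            if (stack_[i - 1] == name) {
                // Close everything opened after it: those end tags were missing
                // or arrived in the wrong order.
                while (stack_.size() >= i) {
                    out_.push_back("</" + stack_.back() + ">");
                    stack_.pop_back();
                }
                return;
            }
        }
        // No matching open element: the end tag is dropped (assumption).
    }

    // At the end of the document, close anything still open.
    void EndDocument() {
        while (!stack_.empty()) {
            out_.push_back("</" + stack_.back() + ">");
            stack_.pop_back();
        }
    }

    const std::vector<std::string>& Output() const { return out_; }

private:
    std::vector<std::string> stack_;
    std::vector<std::string> out_;
};
```

Under these assumptions, both Table A1 and Table A2 produce the same well-formed sequence <A>, <B>, </B>, </A>.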
Table A3 shows the situation where there are no end tags for A and B. The DTD
validator will detect the problem and send an end tag for B to the client. The auto correct
component will query the DTD validator whether the C tag is valid for the parent element A. If
it is valid, an OnStartElementL() will be sent to the client; otherwise the auto correct
component can check further up the element stack to find where this element is valid. If
it is not valid anywhere in the stack, it will be ignored together with any content and its
end element tag.



<A>Content
<B>
More content
<C> 

Some content 
</C> 



Table A3: Missing end tags
Appendix B - How to write a namespace plug-in

The tables below show the WBXML tokens for the example namespace. Tables 1 to 3
each represent a static string table. Table 1 shows the elements for code page 0. Tables 2
and 3 are for the attribute part and the value part of attribute-value pairs respectively.
Each attribute index in Table 2 refers to the value with the same index in Table 3, so these
token values must match up in Tables 2 and 3. If an attribute does not have a value, there
must be a blank, as shown in Table 3 with token 8. Attribute values also appear in Table 3
but have a WBXML token value of 128 or greater.



Element type name    WBXML token
Addr                 5
AddType              6
Auth                 7
AuthLevel            8



Attribute name/value pair (attribute part)    WBXML token
TYPE                                          6
TYPE                                          7
NAME                                          8
NAME                                          9

Table 2: AttributeValuePairNameTable, code page 0


Attribute name/value pair (value part)    WBXML token
ADDRESS                                   6
URL                                       7
(blank)                                   8
BEARER                                    9
GSM/CSD                                   128
GSM/SMS                                   129
GSM/USSD                                  130

Table 3: AttributeValuePairValueTable, code page 0



The following string table files (.st) are created for each table: 



# Element table for code page 0
stringtable ElementCodePage0
EAddr Addr
EAddType AddType
EAuth Auth
EAuthLevel AuthLevel

String table for Table 1
# Attributes table for code page 0
stringtable AttributesCodePage0
EType Type
EType Type
EName Name
EName Name

String table for Table 2
# Attribute values table for code page 0
stringtable AttributeValuesCodePage0
EAddress Address
EURL URL
EBearer BEARER
EGSM_CSD GSM/CSD
EGSM_SMS GSM/SMS
EGSM_USSD GSM/USSD

String table for Table 3
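A namespace plug-in built from these tables answers the CMarkupNamespace queries by token. The following minimal sketch uses std::map lookups and returns plain strings instead of RString handles. The class name is hypothetical, and an empty string stands for both the blank value of token 8 and an unknown token.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of a namespace plug-in for code page 0 of Appendix B.
class ExampleNamespace {
public:
    // Element(): Table 1.
    std::string Element(int token) const { return Lookup(elements_, token); }

    // AttributeValuePair(): attribute part from Table 2, value part from Table 3,
    // looked up with the same token so the two tables must line up.
    std::pair<std::string, std::string> AttributeValuePair(int token) const {
        return {Lookup(attrNames_, token), Lookup(attrValues_, token)};
    }

    // AttributeValue(): stand-alone attribute values use tokens of 128 or more.
    std::string AttributeValue(int token) const {
        return token >= 128 ? Lookup(attrValues_, token) : std::string();
    }

private:
    static std::string Lookup(const std::map<int, std::string>& table, int token) {
        auto it = table.find(token);
        return it == table.end() ? std::string() : it->second;
    }

    const std::map<int, std::string> elements_{
        {5, "Addr"}, {6, "AddType"}, {7, "Auth"}, {8, "AuthLevel"}};
    const std::map<int, std::string> attrNames_{
        {6, "TYPE"}, {7, "TYPE"}, {8, "NAME"}, {9, "NAME"}};
    const std::map<int, std::string> attrValues_{
        {6, "ADDRESS"}, {7, "URL"}, {8, ""}, {9, "BEARER"},
        {128, "GSM/CSD"}, {129, "GSM/SMS"}, {130, "GSM/USSD"}};
};
```

For example, token 7 yields the pair TYPE/URL, token 8 yields NAME with a blank value, and token 129 yields the stand-alone value GSM/SMS.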
Example usage of API

The example below shows how to set up the parser and generator with DTD
checking and auto correction.

RMarkupPlugins plugins;
plugins.Append(KMyValidator);
plugins.Append(KMyAutoCorrector);
CDescriptorDataSupplier* dataSupplier = CDescriptorDataSupplier::NewLC();

RParserSession parser;

parser.OpenL(*dataSupplier, MarkupMimeType, DocumentMimeType, callback, plugins);
parser.Start();

// Callback events will be received
parser.Close();

// Now construct a generator using the same plug-ins and data supplier
RGeneratorSession generator;

generator.OpenL(*dataSupplier, MarkupMimeType, DocumentMimeType, callback,
                plugins);

generator.BuildStartDocumentL();

RAttributeArray attributes;

// Get an RString from the ElementStringTable
// (KElementIndex is a hypothetical index of the element in that table)
RString string = generator.StringPool().String(KElementIndex, ElementStringTable);

// Build one element with content
generator.BuildStartElementL(string, attributes);
generator.BuildContentL(_L8("This is the content"));
generator.BuildEndElementL(string);
generator.BuildEndDocumentL();
generator.Close();

CleanupStack::PopAndDestroy(dataSupplier);
Appendix 2

How the String Pool is used to parse both text and binary mark-up language

The Mark-up Language framework design relies on the fact that it is possible (using the
'String Pool' techniques described below) to provide the same interface to clients whether
text or binary mark-up language is used.

Text-based mark-up languages use strings, i.e. sequences of characters or binary data. In
the String Pool technique, static tables of these strings are created at compile time, with
one string table per namespace, for all the elements, attributes and attribute values
needed to describe a particular type of mark-up document. Each element, attribute and
attribute value is assigned an integer number, and these integer 'handles' form an index of
the strings. A string in an XML document can be rapidly compared to all strings in the
string table by the efficient process of comparing the integer representation of the string
with the integer handles in the static string table. The main benefit of using a string
pool for parsing is therefore that it makes it easy and efficient for the client to check
what is being parsed, since handles to strings are used instead of actual strings. This
means only integers are compared, rather than many characters as would be the normal
case if string pools were not used. Also, comparisons can be carried out in a simple
switch statement in the code, making the code efficient and easier to read and maintain.
Hence, the string pool makes string comparisons efficient at the expense of the cost of
creating the strings.
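A minimal sketch of the technique follows, assuming a hash map for interning; the text does not say how the real string pool is implemented. Static table entries receive handles 0 to n-1 in table order, matching a compile-time enumeration, so client code can switch on a handle. All names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string pool: each distinct string gets one integer handle,
// so later comparisons are integer comparisons.
class StringPool {
public:
    // Static table entries get handles 0..n-1 in table order.
    explicit StringPool(const std::vector<std::string>& staticTable) {
        for (const auto& s : staticTable) Intern(s);
    }

    // Creating a handle costs a hash of the characters; comparing handles is cheap.
    int Intern(const std::string& s) {
        auto [it, inserted] = handles_.try_emplace(s, static_cast<int>(strings_.size()));
        if (inserted) strings_.push_back(s);
        return it->second;
    }

    const std::string& String(int handle) const { return strings_.at(handle); }

private:
    std::unordered_map<std::string, int> handles_;
    std::vector<std::string> strings_;
};

// Handles for the static element table, fixed at compile time.
enum Element { EAddr, EAddType, EAuth, EAuthLevel };

// Client code dispatches on the handle with a switch, never comparing characters.
const char* Describe(int handle) {
    switch (handle) {
        case EAddr: return "address";
        case EAuth: return "authentication";
        default: return "other";
    }
}
```

The trade-off described above is visible here: Intern() still touches every character once, but every comparison after that is a single integer test.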

For binary mark-up language (e.g. WBXML) the situation is more complex, since there
are no strings in WBXML. In WBXML, everything is tokenised (i.e. given a token
number). We get around the absence of strings as follows: a table of mappings of each
of the WBXML tokens to the index of the string in the string table is created (see Figure
8). Each mapping is given a unique integer value, a handle. Since it is required to map
from tokens to strings and vice versa, two lists of integer value handles are created: one
indexed on tokens and the other indexed on the position in the string table.
This makes it quick to map from one type to the other. All this is encapsulated in
the namespace plug-in and is therefore insulated from the client, parser and generator.
The client can therefore parse a binary or text document without having to know about
the specific format; it simply uses the integer handle (RString), which will work
correctly for both text and binary mark-up languages.
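The effect on the client can be shown with a short sketch: a text parser looks up a string and a WBXML parser looks up a token, and both arrive at the same handle. The handle values and tables below reuse Table 1 of Appendix B; the function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Handles for the Table 1 element strings, in string-table order.
enum Handle { EAddr, EAddType, EAuth, EAuthLevel, EUnknown };

// Text path: the parser has the element name as a string.
Handle FromText(const std::string& name) {
    static const std::map<std::string, Handle> table{
        {"Addr", EAddr}, {"AddType", EAddType}, {"Auth", EAuth}, {"AuthLevel", EAuthLevel}};
    auto it = table.find(name);
    return it == table.end() ? EUnknown : it->second;
}

// Binary path: the parser has a WBXML token and maps it to the same handle.
Handle FromWbxmlToken(int token) {
    static const std::map<int, Handle> table{
        {5, EAddr}, {6, EAddType}, {7, EAuth}, {8, EAuthLevel}};
    auto it = table.find(token);
    return it == table.end() ? EUnknown : it->second;
}
```

A client call-back that switches on the handle therefore behaves identically whether "Auth" arrived as text or as token 7.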




CLAIMS

1. A portable computing device programmed with a client that interfaces with a
mark up language parser or a generator via an intermediary layer that (a) insulates the
client from having to communicate directly with the parser or generator and is (b)
generic in that it presents a common API to the client irrespective of the specific kind of
parser or generator the intermediary layer interfaces with.

2. The portable computing device of Claim 1 in which the intermediary layer
operates as an extensible framework and the parser and the generator are each plug-ins
to this framework.

3. The portable computing device of Claim 1 or 2 in which the client interacts with
several kinds of parsers or generators, via the intermediary layer, each handling different
mark-up language formats.

4. The portable computing device of any preceding Claim in which one mark up
language format is WBXML and there is provided a mapping of WBXML tokens to a
string in a static string pool table.

5. The portable computing device of any preceding Claim programmed with a file
conversion capability requiring a source file to be parsed by the parser, which is adapted
to handle one format and for an output, converted file to be generated by the generator,
adapted to handle a different file format.

6. The portable computing device of any preceding Claim in which several different 
clients are able to share the same parsers or generators. 

7. A method of parsing a mark-up language document, in which a client interfaces
with a mark up language parser via an intermediary layer that (a) insulates the client from
having to communicate directly with the parser and is (b) generic in that it presents a
common API to the client irrespective of the specific kind of parser the intermediary
layer interfaces with.

8. A method of generating a mark-up language document, in which a client
interfaces with a mark up language generator via an intermediary layer that (a) insulates
the client from having to communicate directly with the generator and is (b) generic in
that it presents a common API to the client irrespective of the specific kind of generator
the intermediary layer interfaces with.

ABSTRACT

GENERIC HANDLING BY A CLIENT OF DIFFERENT MARK UP 
LANGUAGE PARSERS AND GENERATORS 

A portable computing device is programmed with a client that interfaces with a mark up
language parser or a generator via an intermediary layer that (a) insulates the client from
having to communicate directly with the parser or generator and is (b) generic in that it
presents a common API to the client irrespective of the specific kind of parser or
generator the intermediary layer interfaces with.

In this way, the client is no longer tied to a single kind of parser or generator; it can
operate with any different kind of parser compatible with the intermediary layer, yet it
remains far simpler than prior art clients that are hard-coded to operate directly with
several different kinds of parsers and generators.

Fig 2: Block diagram of a client parsing using a DTD validator and auto-corrector.
Fig 7: Sequence diagram showing DTD validation and auto-correction.